Wujian ZHANG Runde ZHOU Tsunehachi ISHITANI Ryota KASAI Toshio KONDO
The ring-like systolic array architecture described in this paper, based on a conventional one-dimensional systolic array architecture, was created through operator rescheduling based on the symmetry of data flow. This eliminated high-latency delay due to the stuffing of the array pipeline in the conventional architecture. The new architecture requires a memory bandwidth no greater than the conventional architecture does, but increases throughput and processor utilization while reducing power consumption.
Koyo NITTA Toshihiro MINAMI Toshio KONDO Takeshi OGURA
This paper describes a unique motion estimation and compensation (ME/MC) hardware architecture for a scene-adaptive algorithm. By statistically analyzing the characteristics of the scene being encoded and controlling the encoding parameters according to the scene, the quality of the decoded image can be enhanced. The most significant feature of the architecture is that the two modules for ME/MC can work independently. Since a time interval can be inserted between the operations of the two modules, a scene-adaptive algorithm can be implemented in the architecture. The ME/MC architecture is loaded on a single-chip MPEG-2 video encoder.
Tetsuya MATSUMURA Satoshi KUMAKI Hiroshi SEGAWA Kazuya ISHIHARA Atsuo HANAMI Yoshinori MATSUURA Stefan SCOTZNIOVSKY Hidehiro TAKATA Akira YAMADA Shu MURAYAMA Tetsuro WADA Hideo OHIRA Toshiaki SHIMADA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Koji TSUCHIHASHI Yasutaka HORIBA
A single-chip MPEG-2 video, audio, and system encoder LSI has been developed. It performs concurrent real-time processing of MPEG-2 422P@ML video encoding, 2-channel Dolby Digital or MPEG-1 audio encoding, and system encoding that generates a multiplexed transport stream (TS) or a program stream (PS). Advanced hybrid architecture, which combines a high performance VLIW media-processor D30V and hardwired video processing circuits, has been adopted to satisfy the demands of both high flexibility and enormous computational capability. A unified control scheme has been newly proposed that hierarchically manages adaptive task priority control over asynchronous video, audio, and system encoding processes in order to achieve real-time concurrent processing using a single D30V. Dual dedicated motion estimation cores consisting of a coarse ME core (CME) for wide range searches and a fine ME core (FME) for precise searches have been integrated to produce high picture quality while using a small amount of hardware. Adopting these features, a single-chip encoder has been fabricated using 0.25-micron 4-layer metal CMOS technology, and integrated into a 14.2 mm 14.2 mm die with 11 million transistors.
Ayako HARADA Shin-ichi HATTORI Tadashi KASEZAWA Hidenori SATO Tetsuya MATSUMURA Satoshi KUMAKI Kazuya ISHIHARA Hiroshi SEGAWA Atsuo HANAMI Yoshinori MATSUURA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Tokumichi MURAKAMI
An MPEG-2 422P@HL encoder chip set composed of a preprocessing LSI, an encoding LSI, and a motion estimation LSI is described. This chip set realizes a two-type scalability of picture resolution and quality, and executes a hierarchical coding control in the overall encoder system. Due to its scalable architecture, the chip set realizes a 422P@HL video encoder with multi-chip configuration. This single encoding LSI achieves 422P@ML video, audio, and system encoding in real time. It employs an advanced hybrid architecture with a 162 MHz media processor and dedicated video processing hardware. It also has dual communication ports for parallel processing with multi-chip configuration. Transferring of reconstructed data and macroblock characteristic data between neighboring encoder modules is executed via these ports. The preprocessing LSI is fabricated using 0.25 micron three-layer metal CMOS technology and integrates 560 K gates in an area of 12.0 mm 12.0 mm . The encoding LSI is fabricated using 0.25 micron four-layer metal CMOS technology and integrates 11 million transistors in an area of 14.2 mm 14.2 mm . The motion estimation LSI is fabricated using 0.35 micron three-layer metal CMOS technology. It integrates 1.9 million transistors in an area of 8.5 mm 8.5 mm . This chip set makes various system configurations possible and allows for a compact and cost-effective video encoder with high picture quality.
Ayuko TAKAGI Shogo MURAMATSU Hitoshi KIYA
In MPEG standard, motion estimation (ME) is used to eliminate the temporal redundancy of video frames. This ME is the most time-consuming task in the encoding of video sequences and is also the one using the most power. Using low-bit images can save power of ME and a conventional architecture fixed to a certain bit width is used for low-bit motion estimator. It is known that there is a trade-off between power and image quality. ME may be used in various situations, and the relation between demands for power or image quality will depend on those circumstances. We therefore developed an architecture for a low-bit motion estimator with adjustable power consumption. In this architecture, we can select the bit width for the input image and adjust the amount of power for ME. To evaluate its effectiveness, we designed the motion estimator by VHDL and used the synthesis results to estimate the performance.
Shoichi ARAKI Takashi MATSUOKA Naokazu YOKOYA Haruo TAKEMURA
This paper describes a new method for detection and tracking of moving objects from a moving camera image sequence using robust estimation and active contour models. We assume that the apparent background motion between two consecutive image frames can be approximated by affine transformation. In order to register the static background, we estimate affine transformation parameters using LMedS (Least Median of Squares) method which is a kind of robust estimator. Split-and-merge contour models are employed for tracking multiple moving objects. Image energy of contour models is defined based on the image which is obtained by subtracting the previous frame transformed with estimated affine parameters from the current frame. We have implemented the method on an image processing system which consists of DSP boards for real-time tracking of moving objects from a moving camera image sequence.
Tingrong ZHAO Masao YANAGISAWA Tatsuo OHTSUKI
This paper describes a highly performance scalable video coder. Wavelet transform is employed to decompose the video frame into different resolutions. Novel features of this coder are 1) a highly efficient multi-resolution motion estimation that requires minimum compuation and overhead motion information is embedded in this scheme; 2) the wavelet coefficients are organized in an extended zero tree (EZT) which is much more efficient than the simple zerotree. We show with experimental results that this video coder achieves good performances both in processing time and compression ratio when applied to typical test video sequences.
Jar-Ferr YANG Shu-Sheng HAO Wei-Yuan LU
In this paper, we propose fast block matching criteria to reduce the implementation complexity of motion estimation in VLSI video coders. Based on generalized quantization of pixel difference measures, the block matching criteria combined with bitmap exclusive-OR (XOR) concept can be realized by short length adders and a multi-input binary counter. The proposed approach can be treated as a generalization of the pixel difference classification (PDC) criterion. Simulation results show that the proposed block-matching criteria along with various block search algorithms achieve better results than the PDC and obtain nearly the same performance as the mean absolute difference (MAD) criterion. However, the complete gate-level synthesis of the proposed matching criterion is much less than those of the MAD and the PDC in the VLSI implementation.
Gaze detection is to locate the position on a monitor screen where a user is looking. In our work, we implement it with a computer vision system setting a single camera above a monitor and a user moves (rotates and/or translates) her face to gaze at a different position on the monitor. For our case, the user is requested not to move pupils of her eyes when she gazes at a different position on the monitor screen, though we are working on to relax this restriction. To detect the gaze position, we extract facial features (both eyes, nostrils and lip corners) automatically in 2D camera images. From the movement of feature points detected in starting images, we can compute the initial 3D positions of those features by recursive estimation algorithm. Then, when a user moves her head in order to gaze at one position on a monitor, the moved 3D positions of those features can be computed from 3D motion estimation by Iterative Extended Kalman Filter (IEKF) and affine transform. Finally, the gaze position on a monitor is computed from the normal vector of the plane determined by those moved 3D positions of features. Especially, in order to obtain the exact 3D positions of initial feature points, we unify three coordinate systems (face, monitor and camera coordinate system) based on perspective transformation. As experimental results, the 3D position estimation error of initial feature points, which is the RMS error between the estimated initial 3D feature positions and the real positions (measured by 3D position tracker sensor) is about 1.28 cm (0.75 cm in X axis, 0.85 cm in Y axis, 0.6 cm in Z axis) and the 3D motion estimation errors of feature points by Iterative Extended Kalman Filter (IEKF) are about 2.8 degrees and 1.21 cm in rotation and translation, respectively. From that, we can obtain the gaze position on a monitor (17 inches) and the gaze position accuracy between the calculated positions and the real ones is about 2.06 inches of RMS error.
Shogo MURAMATSU Hitoshi KIYA Akihiko YAMADA
In this paper, an algorithm of the median-cut quantization (MCQ) is proposed. MCQ is the technique that reduces multi-valued samples to binary-valued ones by adaptively taking the median value as the threshold. In this work, the search process of the median value is derived from the quick-sort algorithm. The proposed algorithm searches the median value bit by bit, and samples are quantized during the search process. Firstly, the bit-serial procedure is shown, and then it is modified to the bit-parallel procedure. The extension to the multi-level quantization is also discussed. Since the proposed algorithm is based on bit operations, it is suitable for hardware implementation. Thus, its hardware architecture is also proposed. To verify the significance, for the application to the motion estimation, the performance is estimated from the synthesis result of the VHDL model.
Mon-Chau SHIE Wen-Hsien FANG Kuo-Jui HUNG Feipei LAI
This paper presents a simulated annealing (SA)-based algorithm for fast and robust block motion estimation. To reduce computational complexity, the existing fast search algorithms move iteratively toward the winning point based only on a finite set of checking points in every stage. Despite the efficiency of these algorithms, the search process is easily trapped into local minima, especially for high activity image sequences. To overcome this difficulty, the new algorithm uses two sets of checking points in every search stage and invokes the SA to choose the appropriate one. The employment of the SA provides the search a mechanism of being able to move out of local minima so that the new algorithm is less susceptible to such a dilemma. In addition, two schemes are employed to further enhance the performance of the algorithm. First, a set of initial checking points which exploit high correlations among the motion vectors of the temporally and spatially adjacent blocks are used. Second, an alternating search strategy is addressed to visit more points without increasing computations. Simulation results show that the new algorithm offers superior performance with lower computational complexity compared to previous works in various scenarios.
Kang Ryoung PARK Si Wook NAM Min Suk LEE Jaihie KIM
This paper describes a new method for detecting the gaze position of a user on a monitor from monocular images. In order to detect the gaze position, we extract facial features (both eyes, nostrils and lip corners) automatically in 2D camera images and estimate the 3D depth information and the initial 3D positions of those features by recursive estimation algorithm in starting images. Then, when a user moves his/her head in order to gaze at one position on a monitor, the moved 3D positions of those features can be estimated from 3D motion estimation by Extended Kalman Filter (EKF) and affine transform. Finally, the gaze position on a monitor is calculated from the normal vector of the plane determined by those moved 3D positions of features. Especially, in order to obtain the exact 3D depth and positions of initial feature points, we unify three coordinate systems (face, monitor and camera coordinate system) based on perspective transformation. As experimental results, the 3D depth and the position estimation error of initial feature points, which is the RMS error between the estimated initial 3D feature positions and the real positions (measured by 3D position tracker sensor) is about 1.28 cm (0.75 cm in X axis, 0.85 cm in Y axis, 0.6 cm in Z axis) and the 3D motion estimation errors of feature points by Extended Kalman Filter (EKF) are about 3.6 degrees and 1.4 cm in rotation and translation, respectively. From that, we can obtain the gaze position on a monitor (17 inches) and the gaze position accuracy between the calculated positions and the real ones is about 2.1 inches of RMS error.
A new hardware algorithm for the block matching video motion estimation is presented. The algorithm works in the full-search fashion but unlike the Full-Search Block Matching Algorithm (FSBMA) it adjusts the number of computations dynamically to variable picture contents. Due to incorporated mechanism of data-driven thresholding, the proposed algorithm performs as four times as less operations comparing to the FSBMA while maintaining the same quality of results. Its hardware implementation is simple and compact. A supportive hardware design as well as simulation results on benchmarks are outlined.
Chi-Hsi SU Hsueh-Ming HANG David W. LIN
A global motion parameter estimation method is proposed. The method can be used to segment an image sequence into regions of different moving objects. For any two pixels belonging to the same moving object, their associated global motion components have a fixed relationship from the projection geometry of camera imaging. Therefore, by examining the measured motion vectors we are able to group pixels into objects and, at the same time, identify some global motion information. In the presence of camera zoom, the object shape is distorted and conventional translational motion estimation may not yield accurate motion modeling. A deformable block motion estimation scheme is thus proposed to estimate the local motion of an object in this situation. Some simulation results are reported. For an artificially generated sequence containing only zoom activity, we find that the maximum estimation error in the zoom factor is about 2. 8 %. Rather good moving object segmentation results are obtained using the proposed object local motion estimation method after zoom extraction. The deformable block motion compensation is also seen to outperform conventional translational block motion compensation for video material containing zoom activity.
Hitoshi KIYA Jun FURUKAWA Yoshihiro NOGUCHI
We propose a motion estimation algorithm using less gray level images, which are composed of bits pixels lower than 8 bits pixels. Threshold values for generating low bits pixels from 8 bits pixels are simply determined as median values of pixels in a macro block. The proposed algorithm reduces the computational complexity of motion estimation at less expense of video quality. Moreover, median cut quantization can be applied to multilevel images and combined with a lot of fast algorithms to obtain more effective algorithms.
In this paper, we present two fast motion estimation techniques with adaptive variable search range using spatial and temporal correlation of moving pictures respectively. The first technique uses a frame difference between two adjacent frames which is used as a criterion for deciding search window size. The second one uses deviation between the past and the predicted current frame motion vectors which is also used as a criterion for deciding search window size. Simulation results show that these methods reduce the number of checking points while keeping almost the same image quality as that of full search method.
In this paper, a new hierarchical block matching algorithm using mean and difference pyramids is presented. The detection of motion vectors at each level of the pyramid is accomplished by selectively eliminating the candidate motion vectors that cannot provide the best match at the next lower level. The remaining motion vectors at each level are propagated and used as the initial motion vectors at the next lower level. Therefore, the possibility of falling into local minima can be significantly reduced. The simulation results show that the proposed method has excellent performance with reduced computational complexity.
Yankang WANG Yanqun WANG Hideo KURODA
This paper presents a novel approach to pixel decimation for motion estimation in video coding. Early techniques of pixel decimation use regular pixel patterns to evaluate matching criterion. Recent techniques use adaptive pixel patterns and have achieved better efficiency. However, these adaptive techniques require an initial division of a block into a set of uniform regions and therefore are only locally-adaptive in essence. In this paper, we present a globally-adaptive scheme for pixel decimation, in which no regions are fixed at the beginning and pixels are selected only if they have features important to the determination of a match. The experiment results show that when no more than 40 pixels are selected out of a 1616 block, this approach achieves a better search accuracy by 13-22% than the previous locally-adaptive methods which also use features.
Li JIANG Dongju LI Shintaro HABA Chawalit HONSAWEK Hiroaki KUNIEDA
In this paper, a dedicated hardware design for motion estimation LSI of MPEG2 is presented. Combining our bits truncation adaptive pyramid (BTAP) algorithm with Window-MSPA architecture, the hardware cost is tremendously reduced without PSNR performance degradation for mean pyramid algorithm. The core of the test chip working at 83 MHz, performs a search range of 67 for image size of 1920 1152 and achieves video rate of 60 field/s. It can be used for HDTV purpose. The chip size is 4. 8 mm 4. 8 mm with 0. 5u 2-level metal CMOS technology. The result in this paper shows our promising future to realize one chip HDTV MPEG2 encoder.
Gen FUJITA Takao ONOYE Isao SHIRAKAWA
A VLSI architecture of a motion estimator is described dedicatedly for the H. 263 low bitrate video coding. Adopting an efficient hierarchical search algorithm, a new motion estimator yields high quality vectors with small area occupancy and at a low operation frequency. A one-dimensional PE (Processing Element) array is devised to be tuned to the H. 263 encoding, which treats both the advanced prediction mode and the PB-frame mode. The proposed motion estimation core is integrated in 1. 55 mm2 by using 0. 35 µm CMOS 3LM technology, which operates at 15 MHz, and hence enables the realtime motion estimation of QCIF pictures.